k-Anonymous Microdata Release via Post Randomisation Method
Authors
Abstract
Releasing anonymized microdata is an important problem in statistical disclosure control (SDC) and privacy-preserving data publishing (PPDP), and it remains largely unsolved. In these fields, k-anonymity has been widely studied as an anonymity notion, mainly for deterministic anonymization algorithms, and some probabilistic relaxations have been developed. However, these relaxations are insufficient because they are either weaker than the original k-anonymity or require strong parametric assumptions. First, we propose Pk-anonymity, a new probabilistic k-anonymity, and prove that Pk-anonymity is a mathematical extension of k-anonymity rather than a relaxation. Furthermore, Pk-anonymity requires no parametric assumptions. This property is significant because it enables us to compare the privacy levels of probabilistic microdata release algorithms with those of deterministic ones. Second, we apply Pk-anonymity to the post randomization method (PRAM), an SDC algorithm based on randomization. PRAM is proven to satisfy Pk-anonymity in a controlled way, i.e., one can set PRAM's parameter so that Pk-anonymity is satisfied. PRAM is also known to satisfy ε-differential privacy, a popular and strong recent privacy notion. Our results therefore significantly enhance PRAM, since they imply that it satisfies both important notions: k-anonymity and ε-differential privacy.
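As a rough illustration of the randomization the abstract refers to, the sketch below applies PRAM to a single categorical attribute: each value is kept or swapped according to a row of a transition matrix. The uniform-retention matrix, the `retain` parameter, and all function names are assumptions made for this example and are not the paper's parameterization. The `epsilon` helper computes the standard per-record bound for a randomization matrix (the largest log-ratio between two inputs' probabilities of producing the same output); it does not compute the paper's Pk-anonymity level, which the abstract does not specify.

```python
import math
import random


def pram_matrix(categories, retain):
    """Hypothetical uniform-retention matrix: keep a value with probability
    `retain`, otherwise move to one of the other categories uniformly.
    Requires at least two categories and 0 < retain < 1."""
    off = (1.0 - retain) / (len(categories) - 1)
    return {a: {b: (retain if a == b else off) for b in categories}
            for a in categories}


def pram(values, matrix, rng):
    """Randomize each value independently using its row of the matrix."""
    out = []
    for v in values:
        row = matrix[v]
        targets = list(row)
        weights = [row[t] for t in targets]
        out.append(rng.choices(targets, weights=weights, k=1)[0])
    return out


def epsilon(matrix):
    """Largest log-ratio of two inputs' probabilities of yielding the same
    output; assumes every matrix entry is strictly positive."""
    cats = list(matrix)
    return max(math.log(matrix[a][c] / matrix[b][c])
               for c in cats for a in cats for b in cats)


if __name__ == "__main__":
    cats = ["a", "b", "c", "d"]
    m = pram_matrix(cats, retain=0.7)
    released = pram(["a", "b", "b", "d"], m, random.Random(0))
    print(released, epsilon(m))
```

Lowering `retain` toward the uniform value `1 / len(categories)` shrinks the bound returned by `epsilon` at the cost of more perturbed records, which is the kind of parameter control the abstract describes.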
Similar resources
SLOMS: A Privacy Preserving Data Publishing Method for Multiple Sensitive Attributes Microdata
Multi-dimension bucketization is a typical method to anonymize multiple sensitive attributes. However, the method leads to low data utility when microdata have more sensitive attributes. In addition, the method does not generalize quasi-identifiers, which makes the anonymous data vulnerable to linking attacks. To address these problems, the paper proposes the SLOMS method. The method verti...
An efficient hash-based algorithm for minimal k-anonymity
A number of organizations publish microdata for purposes such as public health and demographic research. Although attributes of microdata that clearly identify individuals, such as name and medical care card number, are generally removed, these databases can sometimes be joined with other public databases on attributes such as Zip code, Gender and Age to reidentify individuals who were supposed...
Anonymity: Formalisation of Privacy – k-anonymity
Microdata is the basis of statistical studies. If microdata is released, it can leak sensitive information about the participants, even if identifiers like name or social security number are removed. A proper anonymization for statistical microdata is essential. K-anonymity has been intensively discussed as a measure for anonymity in statistical data. Quasi identifiers are attributes that might...
Working Paper ENGLISH ONLY UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE (UNECE) CONFERENCE OF EUROPEAN STATISTICIANS EUROPEAN COMMISSION STATISTICAL OFFICE OF THE EUROPEAN
The usual approach to generate k-anonymous data sets, based on generalization of the quasi-identifier attributes, does not provide any control on the variability of the confidential attributes within the k-anonymous groups. If the latter variability is too small, privacy is not sufficiently protected, while, for large variabilities, data utility is substantially damaged. Some refinements to the...
A Survey on Privacy Preservation in Data Publishing
Privacy-maintaining data release is one of the most important challenges in an information system, because of the wide collection of sensitive information on the internet. A number of solutions have been designed for privacy-maintaining data release. This paper provides a review of the state-of-the-art methods for privacy protection. The paper discusses novel and powerful privacy definition...